阅读理解是一个复杂的认知过程,涉及许多人类大脑活动。大量作品研究了在信息检索相关方案中阅读理解的模式和注意力分配。但是,关于阅读理解过程中人脑中发生的事情以及这些认知活动如何影响信息检索过程,知之甚少。此外,随着脑成像技术(例如脑电图(EEG))的进步,几乎可以实时收集大脑信号,并探索是否可以用作反馈来促进信息获取性能。在本文中,我们仔细设计了一项基于实验室的用户研究,以调查阅读理解过程中的大脑活动。我们的发现表明,神经反应随着不同类型的阅读内容而变化,即可以满足用户信息需求和无法无法满足的内容的内容。我们建议在阅读理解过程中以微观时间量表以微观时间量表来支持各种认知活动,例如认知负载,语义主题理解和推论处理。从这些发现中,我们说明了一些有关信息检索任务的见解,例如排名模型构建和界面设计。此外,我们建议有可能检测主动现实世界系统的阅读理解状态。为此,我们为基于脑电图的阅读理解建模(UERCM)提出了一个统一的框架。为了验证其有效性,我们基于脑电图特征进行了大量的实验,以进行两项阅读理解任务:回答句子分类和回答提取。结果表明,通过大脑信号提高两个任务的性能是可行的。
translated by 谷歌翻译
智能城市的智能交通灯可以最佳地减少交通拥堵。在这项研究中,我们采用了加强学习,培训了城市移动模拟器的红绿灯的控制代理。由于现有工程的差异,除了基于价值的方法之外,利用基于策略的深度加强学习方法,近端策略优化(PPO),例如Deep Q网络(DQN)和双DQN(DDQN)。首先,将获得PPO的最佳政策与来自DQN和DDQN的PPO相比。发现PPO的政策比其他政策更好。接下来,而不是固定间隔的流量光阶段,我们采用具有可变时间间隔的光相位,这导致更好的策略来传递流量流。然后,研究了环境和行动干扰的影响,以展示基于学习的控制器是强大的。最后,我们考虑不平衡的交通流量,并发现智能流量可以适度地对不平衡的流量方案执行,尽管它仅从平衡流量方案中了解最佳策略。
translated by 谷歌翻译
具有高级别规格的自治系统的运动规划具有广泛的应用。然而,涉及定时时间逻辑的正式语言的研究仍在调查中。此外,许多现有结果依赖于用户指定的任务在给定环境中可行的关键假设。当操作环境是动态和未知的挑战时,由于环境可以找到禁止,导致预先定时定时任务无法完全满足潜在冲突的任务。在考虑时间束缚要求时,这些问题变得更具挑战性。为了解决这些挑战,这项工作提出了一种控制框架,其考虑了强制限制来强制执行安全要求和软限制,以启用任务放松。使用度量间隔时间逻辑(MITL)规范来处理时间限制约束。通过构建轻松的定时产品自动机,在线运动规划策略与后退地平线控制器合成以产生政策,以减少优先顺序的降低方式实现多重目标1)正式保证了对硬安全限制的满足感; 2)主要满足软定时任务; 3)尽可能收集时变奖励。放松结构的另一个新颖性是考虑违反时间和任务的不可行情况。提供仿真结果以验证所提出的方法。
translated by 谷歌翻译
本文研究了Markov决策过程(MDP)建模的自主动态系统的运动规划,在连续状态和动作空间上具有未知的过渡概率。线性时间逻辑(LTL)用于指定无限地平线上的高级任务,可以转换为具有几种接受集的极限确定性广义B \“UCHI Automaton(LDGBA)。新颖性是设计嵌入式产品MDP(通过结合同步跟踪 - 前沿函数来记录自动化的同步跟踪 - 前沿函数,并促进接受条件的满足感。基于LDGBA的奖励塑造和折扣方案的模型的满足 - 免费加强学习(RL)仅取决于EP-MDP状态,并可以克服稀疏奖励的问题。严格的分析表明,任何优化预期折扣返回的RL方法都保证找到最佳策略,其迹线最大化满意度概率。然后开发模块化深度确定性政策梯度(DDPG)以在连续状态和行动空间上生成此类策略。我们的f Ramework通过一系列Openai健身房环境进行评估。
translated by 谷歌翻译
本文研究了运动和环境不确定性的最佳运动规划。通过将系统建模作为概率标记的马尔可夫决策过程(PL-MDP),控制目标是合成有限内存策略,在该策略下,该代理满足具有所需满足的线性时间逻辑(LTL)的高级复杂任务可能性。特别地,考虑了满足无限地平线任务的轨迹的成本优化,分析了降低预期平均成本和最大化任务满意度概率之间的权衡。而不是使用传统的Rabin Automata,LTL公式被转换为限制确定性的B \“UCHI自动机(LDBA),其具有更直接的接受条件和更紧凑的图形结构。这项工作的新颖性在于考虑案件LTL规范可能是不可行的,并且在PL-MDP和LDBA之间的轻松产品MDP的开发可能是不可行的和开发。放松的产品MDP允许代理在任务不完全可行的情况下进行修改其运动计划,并量化修订计划的违规测量。然后配制多目标优化问题,共同考虑任务满意度的概率,违反原始任务限制的违规以及策略执行的实施成本,通过耦合的线性计划解决。据最好我们的知识,它是第一个弥合规划修订版和计划前缀和计划的最佳控制合成之间的差距的工作在无限地平线上修复代理轨迹。提供实验结果以证明所提出的框架的有效性。
translated by 谷歌翻译
Knowledge graphs (KG) have served as the key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works in KG completion and CKG completion suffer from long-tail relations and newly-added relations which do not have many know triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to challenge the problem of limited annotated data. In this paper, we comprehensively survey previous attempts on such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.
translated by 谷歌翻译
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise the segmentation performance over the target domain. A key idea to tackle this problem is to perform both image-level and feature-level adaptation jointly. Unfortunately, there is a lack of such unified approaches for UDA tasks in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; and we further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses previous SOTA by 8%, achieving 58.2% in mIoU.
translated by 谷歌翻译
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from the trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent neural-network-based PDE solvers, have suggested the dawn of overcoming this challenge. In this emerging direction, Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab to be considered in diverse applications of partial differential equations.
translated by 谷歌翻译
Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.
translated by 谷歌翻译
Transformer has achieved impressive successes for various computer vision tasks. However, most of existing studies require to pretrain the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) for achieving satisfactory performance, which is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement generated by the ImageNet pretrained weights significantly degrades while transferring the weights to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach specifically for medical image classification with the Transformer backbone. Our BOLT consists of two networks, namely online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network representation of the same patch embedding tokens with a different perturbation. To maximally excavate the impact of Transformer from limited medical data, we propose an auxiliary difficulty ranking task. The Transformer is enforced to identify which branch (i.e., online/target) is processing the more difficult perturbed tokens. Overall, the Transformer endeavours itself to distill the transformation-invariant features from the perturbed tokens to simultaneously achieve difficulty measurement and maintain the consistency of self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading and diabetic retinopathy grading. The experimental results validate the superiority of our BOLT for medical image classification, compared to ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.
translated by 谷歌翻译